-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guarded Devirt: multiple type checks #59604
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis PR extends GDV to support multiple type-checks. For now secondary guesses aren't inlineable yet (but devirtualize-able). using System;
using System.Runtime.CompilerServices;
using System.Threading;
class ClassA
{
public virtual int DoWork() => 42;
}
class ClassB : ClassA
{
public override int DoWork() => 43;
}
class ClassC : ClassB
{
public override int DoWork() => 44;
}
public class Program
{
public static void Main()
{
for (int i = 0; i < 200; i++)
{
Test(new ClassC());
Test(new ClassB());
Test(new ClassA());
Thread.Sleep(10);
}
Console.ReadKey();
}
[MethodImpl(MethodImplOptions.NoInlining)]
static int Test(ClassA a) => a.DoWork();
} Tier1 codegen for ; Assembly listing for method Program:Test(ClassA):int
G_M42821_IG01: ;; offset=0000H
4883EC28 sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M42821_IG02: ;; offset=0004H
48B8A80459D6FD7F0000 mov rax, 0x7FFDD65904A8 ; ClassC
483901 cmp qword ptr [rcx], rax
7507 jne SHORT G_M42821_IG04
;; bbWeight=1 PerfScore 4.25
G_M42821_IG03: ;; offset=0013H
B82C000000 mov eax, 44
EB38 jmp SHORT G_M42821_IG09
;; bbWeight=0.37 PerfScore 0.83
G_M42821_IG04: ;; offset=001AH
48B8000359D6FD7F0000 mov rax, 0x7FFDD6590300 ; ClassB
483901 cmp qword ptr [rcx], rax
7508 jne SHORT G_M42821_IG06
;; bbWeight=0.18 PerfScore 0.79
G_M42821_IG05: ;; offset=0029H
FF15611C4100 call [ClassB:DoWork():int:this]
EB21 jmp SHORT G_M42821_IG09
;; bbWeight=0.37 PerfScore 1.85
G_M42821_IG06: ;; offset=0031H
48B8580159D6FD7F0000 mov rax, 0x7FFDD6590158 ; ClassA
483901 cmp qword ptr [rcx], rax
7508 jne SHORT G_M42821_IG08
;; bbWeight=0.18 PerfScore 0.79
G_M42821_IG07: ;; offset=0040H
FF15A21A4100 call [ClassA:DoWork():int:this]
EB0A jmp SHORT G_M42821_IG09
;; bbWeight=0.25 PerfScore 1.25
G_M42821_IG08: ;; offset=0048H
488B01 mov rax, qword ptr [rcx]
488B4048 mov rax, qword ptr [rax+72]
FF5020 call qword ptr [rax+32]ClassA:DoWork():int:this
;; bbWeight=0.01 PerfScore 0.07
G_M42821_IG09: ;; offset=0052H
90 nop
;; bbWeight=1 PerfScore 0.25
G_M42821_IG10: ;; offset=0053H
4883C428 add rsp, 40
C3 ret
;; bbWeight=1 PerfScore 1.25
; Total bytes of code 88, prolog size 4, PerfScore 20.38, instruction count 22, allocated bytes for code 88 (MethodHash=afda58ba) for method Program:Test(ClassA):int
; ============================================================ cc @AndyAyersMS
|
I wonder if we should take this opportunity to clean up how we're using the fields in GT_CALL. What's there now is somewhat hacky (eg we have to save/restore the stub address since it can be "live" for GDV candidates). |
/azp run runtime-coreclr pgo |
/azp run runtime-coreclr pgo |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr pgo |
Azure Pipelines successfully started running 1 pipeline(s). |
@AndyAyersMS this is ready for review now, it passes all my tests and works on complicated apps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few other questions...
- What does this look like in the inline tree display?
- Are we going to report multiple inline failures to the VM for GDV inline attempts?
If we devirtualize multiple candidates to the same method it would be nice (perhaps) for them to share the call (downside might be we lose exact type in that call).
We should think about what makes sense for chaining here too... perhaps it is only interesting if we're stringing together monomorphic GDVs.
@@ -4795,6 +4806,18 @@ struct GenTreeCall final : public GenTree | |||
IL_OFFSET gtRawILOffset; | |||
#endif // defined(DEBUG) || defined(INLINE_DATA) | |||
|
|||
UINT8 gtGDVCandidatesCount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make a call node bigger?
If we have room for this, do we have enough room to make the union above a tagged union?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AndyAyersMS Not sure I follow, here is the current win-x64 layout (before my changes) in Release /d1reportSingleClassLayoutGenTreeCall
:
4>class GenTreeCall size(144):
4> +---
4> 0 | +--- (base class GenTree)
4> 0 | | genTreeOps gtOper
4> 1 | | var_types gtType
4> 2 | | gtCSEnum
4> 3 | | gtLIRFlags
4> 4 | | AssertionInfo gtAssertionInfo
4> 6 | | _gtCostEx
4> 7 | | _gtCostSz
4> 8 | | _gtRegNum
4> | | <alignment member> (size=3)
4>12 | | GenTreeFlags gtFlags
4>16 | | ValueNumPair gtVNPair
4>24 | | gtRsvdRegs
4> | | <alignment member> (size=4)
4>32 | | gtNext
4>40 | | gtPrev
4> | +---
4>48 | gtCallThisArg
4>56 | gtCallArgs
4>64 | gtCallLateArgs
4>72 | fgArgInfo
4>80 | tailCallInfo
4>80 | CorInfoCallConvExtension unmgdCallConv
4>88 | GenTreeCallFlags gtCallMoreFlags
4>92. | gtCallType (bitstart=0,nbits=3)
4>92. | gtReturnType (bitstart=3,nbits=5)
4> | <alignment member> (size=3)
4>96 | gtRetClsHnd
4>104 | gtCallCookie
4>104 | gtInlineCandidateInfo
4>104 | gtClassProfileCandidateInfo
4>104 | gtStubCallStubAddr
4>104 | compileTimeHelperArgumentHandle
4>104 | gtDirectCallAddress
4>112 | gtControlExpr
4>120 | gtCallMethHnd
4>120 | gtCallAddr
4>128 | CORINFO_CONST_LOOKUP gtEntryPoint
in theory we can fit the count into those spare 3 bits? (means maxvalue is 7) if the size matters that much - not sure about impact on other archs/os
Co-authored-by: Andy Ayers <andya@microsoft.com>
…into gdv-multiple-1
@EgorBo this PR will be closed when you finish all the remaining work items?
|
Right, I need some real-world benchmark to prove it makes sense + most likely I need to implement "Chaining" optimization here, it works today for single-guess GDV but in my PR it's disabled for multi-guess calls so there is some overhead. |
This PR is WIP. |
Closing for now to keep number of open PR smaller, going to return to it |
This PR extends GDV (Guarded Devirtualization) to support multiple type-checks (see #58984 comments ) instead of current "a check for the most popular type and a fallback for everything else".
Plan:
Examples (click to zoom)
No PGO (e.g. .NET 5.0):
PGO (main):
PGO (this PR):
Flowgraph (after
INDXCALL
phase):cc @AndyAyersMS